1996-04-25
OS/2 Upload Information Template for ftp-os2.nmsu.edu
Archive Name: UNH202.ZIP
Program Description: a command line utility to strip HTML codes
Operating System Versions: OS/2 2.x and later
Program Source: Don Hawkinson, author
Replaces: UNH175.ZIP UNH150.ZIP
Your name: Don Hawkinson
Your email address: dwhawk@southwind.net
Proposed directory for placement: ./os2/textutil
This is an OS/2 command line utility to strip HTML codes from
files saved from WebX or other web browsers.
UNH 2.02 HTML stripper by Don Hawkinson dwhawk@southwind.net
usage: ..\unh file1 file2 <file3>
file1 == html file
file2 == stripped text output file
file3 == URLs from html source file - optional
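The usage above can be illustrated with a minimal sketch, assuming a simple
regular-expression approach. This is not UNH's actual code: the names
strip_tags, main, TAG_RE and HREF_RE are hypothetical, and the rule that
"URLs" means href attribute values is an assumption.

```python
import re
import sys

# Matches any markup tag such as <p>, </a> or <img src="x.gif">.
TAG_RE = re.compile(r"<[^>]*>")
# Assumption: "URLs" are taken to be href attribute values (quoted or not).
HREF_RE = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def strip_tags(html):
    """Return (text, urls): the text with tags removed, plus href targets."""
    urls = HREF_RE.findall(html)
    text = TAG_RE.sub("", html)
    return text, urls


def main(argv):
    """Hypothetical entry point mirroring: unh file1 file2 <file3>"""
    if len(argv) not in (3, 4):
        print("usage: unh file1 file2 <file3>", file=sys.stderr)
        return 1
    with open(argv[1], encoding="latin-1") as f:
        text, urls = strip_tags(f.read())
    # Mode "w" truncates, so an existing output file is overwritten
    # without warning, matching the behavior described for UNH.
    with open(argv[2], "w", encoding="latin-1") as f:
        f.write(text)
    if len(argv) == 4:
        # The optional third file receives one URL per line.
        with open(argv[3], "w", encoding="latin-1") as f:
            f.write("\n".join(urls) + "\n")
    return 0
```

For example, main(["unh", "page.htm", "page.txt", "urls.txt"]) would write
the stripped text to page.txt and the collected links to urls.txt.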
UNH does not check for the existence of the output
file, and will overwrite any existing file. UNH
is HPFS aware.
UNH does not attempt to recreate the format of the
Web page. UNH does not attempt to force any format
on the output text, nor does it attempt to remove any existing
text format. While the layout of tables and lists is lost
during stripping, the data is placed on separate lines for
legibility.
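One way to place table cells and list items on separate lines is to turn
block-level tags into line breaks before removing the remaining markup. The
sketch below assumes that approach; the tag list in BREAK_TAGS and the name
strip_to_lines are hypothetical, since UNH's own rules are not documented.

```python
import re

# Hypothetical set of tags treated as line breaks.
BREAK_TAGS = ("br", "p", "li", "tr", "td", "th", "div", "table", "ul", "ol",
              "h1", "h2", "h3", "h4", "h5", "h6")
# Matches opening or closing forms, e.g. <td>, </TD>, <p class="x">.
BREAK_RE = re.compile(r"<\s*/?\s*(?:%s)\b[^>]*>" % "|".join(BREAK_TAGS),
                      re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]*>")


def strip_to_lines(html):
    # Block-level tags become newlines, so cells and items land on own lines.
    text = BREAK_RE.sub("\n", html)
    # Any remaining (inline) tags such as <b> or <a> are simply dropped.
    text = TAG_RE.sub("", text)
    # Trim each line and discard blank lines left by adjacent breaks.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

The table's layout is lost, but each cell survives as its own line, which
matches the trade-off described above.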
The HTML specification defines character entities, referred to
here as tags, to represent particular graphic characters that have
special meaning in the markup, or that may not be part of the
character set available to the writer. UNH does not attempt
to scan for all of the possible tags, but does try to resolve
the most common ones.
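Resolving only a small set of common entities, while leaving any other
&xxxx; tag untouched in the output, might look like the sketch below. The
contents of COMMON_ENTITIES are an assumption about what counts as "most
common"; the name resolve_entities is hypothetical.

```python
import re

# Hypothetical "most common" subset; UNH's actual table is not documented.
COMMON_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "nbsp": " "}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_entities(text):
    def replace(match):
        # Unrecognized names fall back to the original text, so the tag
        # is left in the output unchanged.
        return COMMON_ENTITIES.get(match.group(1), match.group(0))
    return ENTITY_RE.sub(replace, text)
```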
This version of UNH supports codepages 437 and 850;
if codepage 850 is in use, the 850 character set is used.
The codepage only makes a difference when &xxxx; tags are
present in the file. If the correct character or an acceptable
alternate is not available in the active codepage, the &xxxx; tag
is left in the file.
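The codepage check can be sketched by looking up each named entity in the
HTML 4 entity table from Python's html.entities module and testing whether
the character can be encoded in the active codepage (Python ships cp437 and
cp850 codecs). The ALTERNATES fallback table and the function names are
hypothetical; UNH's own list of acceptable alternates is not published.

```python
import re
from html.entities import name2codepoint

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Hypothetical plain-letter fallbacks for characters a codepage lacks.
ALTERNATES = {"Atilde": "A", "atilde": "a", "Otilde": "O", "otilde": "o"}


def char_for(name, codepage):
    """Return a character for entity `name` in `codepage`, or None."""
    code = name2codepoint.get(name)
    if code is not None:
        ch = chr(code)
        try:
            ch.encode(codepage)  # raises if the codepage lacks the character
            return ch
        except UnicodeEncodeError:
            pass
    return ALTERNATES.get(name)


def resolve_for_codepage(text, codepage):
    """Resolve &xxxx; tags for 'cp437' or 'cp850'; leave the rest alone."""
    def replace(match):
        ch = char_for(match.group(1), codepage)
        return ch if ch is not None else match.group(0)
    return ENTITY_RE.sub(replace, text)
```

For instance, &Oslash; resolves to its character under cp850 but stays in
the text under cp437, which has no such character and no alternate here.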
Only a few of the &#nnn; tags are supported. They do not seem to
be widely used, and scanning for all of them would increase the time
it takes to process an .HTML or .HTM file.
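Supporting only a handful of numeric references can be sketched as a single
pattern match followed by a lookup in a short table; anything not in the
table stays as written. The codes in SUPPORTED_NUMERIC are an assumption,
not UNH's actual list, and resolve_numeric is a hypothetical name.

```python
import re

# Hypothetical short list of supported &#nnn; codes.
SUPPORTED_NUMERIC = {34: '"', 38: "&", 60: "<", 62: ">", 160: " "}
NUMERIC_RE = re.compile(r"&#(\d+);")


def resolve_numeric(text):
    def replace(match):
        # Unsupported codes fall back to the original &#nnn; text.
        return SUPPORTED_NUMERIC.get(int(match.group(1)), match.group(0))
    return NUMERIC_RE.sub(replace, text)
```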
If an unrecognized tag is encountered, it is left in the output text.
This version should be usable under OS/2 2.1, but it has not been
tested there. The special compression option for OS/2 Warp was not used
when linking the executable.
This program is free, but the author retains all rights. See the file
license.txt for further information.
The command line utility UNH.EXE uses the same logic as PMStripper
to strip the HTML codes from files. For information on PMStripper,
send email to dwhawk@southwind.net.